Optimum Cluster Labeling and Document Clustering for Forensic Analysis

Authors

  • Shamrao Adagale
  • Shubhangi Sagar Vairagar
  • Amrit Priyadarshi
Abstract

Document clustering, or unsupervised document classification, is the automated process of grouping documents with similar content. It is an important task in many information retrieval systems. Document clustering algorithms can also help discover new and useful knowledge, or novel classes, in the documents under analysis, which is particularly important in forensic analysis. Digital forensic investigation is the branch of forensic science concerned with investigating material found in digital devices in connection with computer crimes. In computer forensics, hundreds of thousands of files per computer are examined, so methods for automated data analysis, such as clustering, are required. Labeling a large data set with cluster assignments bridges effective cluster analysis and the large data set. The main problems at this stage are labeling irregularly shaped clusters, distinguishing outliers, and extending cluster boundaries. We address these problems and propose a cluster labeling algorithm that is intuitive and easy to use.
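The labeling stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a simple nearest-centroid rule with cosine similarity over term-frequency vectors, and a hypothetical `outlier_threshold` below which a document is marked as an outlier. All function names, cluster names, and sample documents are illustrative.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Sum of the member vectors; cosine similarity ignores the scale."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def label_documents(clusters, documents, outlier_threshold=0.2):
    """Assign each document to its most similar cluster, or to 'outlier'.

    clusters: dict mapping a cluster name to already-clustered sample texts.
    outlier_threshold: assumed cutoff, not a value taken from the paper.
    """
    centroids = {
        name: centroid([tf_vector(d) for d in docs])
        for name, docs in clusters.items()
    }
    labels = []
    for doc in documents:
        vec = tf_vector(doc)
        best_name, best_sim = None, 0.0
        for name, c in centroids.items():
            sim = cosine(vec, c)
            if sim > best_sim:
                best_name, best_sim = name, sim
        labels.append(best_name if best_sim >= outlier_threshold else "outlier")
    return labels


# Illustrative data: a small clustered sample and three unlabeled documents.
clusters = {
    "finance": ["invoice payment bank transfer", "bank account payment due"],
    "travel": ["flight hotel booking", "hotel reservation flight ticket"],
}
documents = [
    "payment sent to bank account",
    "flight ticket booking confirmed",
    "recipe for chocolate cake",
]
labels = label_documents(clusters, documents)
print(labels)
```

A centroid rule like this handles only roughly spherical clusters. The irregularly shaped clusters and boundary extension mentioned in the abstract would need a different representation, for example several representative points per cluster.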

Similar resources

A New Approach for Improving Computer Inspections by Using Fuzzy Methods for Forensic Data Analysis

Nowadays, data stored on computers has great significance, and this data is extremely critical for future use and study, irrespective of field. Therefore, assessment of such data is a vital and imperative task. In computer forensic analysis, a lot of data present in the digital campaign is studied to extract data, and computers consist of hundreds of thousands of ...

An Efficient Technique to Improve Snippet Clustering

Document clustering is an effective tool to manage information overload. By grouping similar documents together, we enable a human observer to quickly browse large document collections and to easily grasp the distinct topics and subtopics. In this paper we survey the most important problems and techniques related to text information retrieval: document pre-processing and filtering...

Subject-based semantic document clustering for digital forensic investigations

Computers are increasingly used as tools to commit crimes such as unauthorized access (hacking), drug trafficking, and child pornography. The proliferation of crimes involving computers has created a demand for special forensic tools that allow investigators to look for evidence on a suspect’s computer by analyzing communications and data on the computer’s storage devices. Motivated by the fore...

Clustering Technique in Multi-Document Personal Name Disambiguation

Focusing on multi-document personal name disambiguation, this paper develops an agglomerative clustering approach to resolving this problem. We start from an analysis of pointwise mutual information between features and the ambiguous name, which brings about a novel weight computing method for features in clustering. Then a trade-off measure between within-cluster compactness and among-cluster se...

Automatic Labeling of Document Clusters

Automatically labeling document clusters with words which indicate their topics is difficult to do well. The most commonly used method, labeling with the most frequent words in the clusters, ends up using many words that are virtually void of descriptive power even after traditional stop words are removed. Another method, labeling with the most predictive words, often includes rather obscure wo...

Publication date: 2014